Goto

Collaborating Authors

 concave n-person game


Reviews: Bandit Learning in Concave N-Person Games

Neural Information Processing Systems

Context: It is a classic result that empirical frequencies of actions for players playing regret minimization algorithms converges to a coarse corr equilibrium. CCEs are not necessarily desirable solution concepts because they sometimes admit irrational behavior. For monotone games, it is known that the empirical frequencies converge converge to nash equilibrium for agents playing FTRL. Recently, Mertikopoulos et al proved that the sequence of plays for FTRL converges to nash for games -- they prove something more general that goes beyond concave potential games, in fact. This work considers that case when each agent can only observe bandit feedback.


Bandit Learning in Concave N-Person Games

Bravo, Mario, Leslie, David, Mertikopoulos, Panayotis

Neural Information Processing Systems

This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents' most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players' behavior stabilizes in the long run: no-regret learning may lead to cycles, even with perfect gradient information. However, if a standard monotonicity condition is satisfied, our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability 1. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization.